AN APPARATUS FOR PROVIDING DIRECT DATA PROCESSING 
ACCESS USING A QUEUED DIRECT INPUT-OUTPUT DEVICE 

FIELD OF INVENTION 

The present invention pertains in general to a new Input-Output
facility design that exploits high bandwidth integrated network
adapters.

BACKGROUND OF THE INVENTION

In a network computing environment, multitudes of commands and
requests for retrieval and storage of data are processed every
second. To properly address the complexity of routing these
commands and requests, environments with servers have
traditionally offered integrated network connectivity to allow
direct attachments of clients such as Local Area Networks (LANs).
Given the size of most servers, the number of clients usually is
in the range of hundreds to thousands and the bandwidth required
is in the 10-100 Mbits/sec range. However, in recent years the
servers have grown and the amount of data they are required to
handle has grown with them. As a result, the existing I/O
architectures need to be modified to support this order of
magnitude increase in the bandwidth.

In addition, new Internet applications have increased the demand
for improved latency. The adapters must support a larger number
of users and connections to consolidate the network interfaces
which are visible externally. The combination of all the above
requirements presents a unique challenge to server I/O
subsystems.
[...] such as International Business Machines [...] System
Architecture/390 ([...] of International Business Machines
Corporation), there are additional requirements that the [...]
must remain consistent with existing support. [...] must continue
to run unmodified, and error [...] must be preserved or even
[...]. [...] resources must be enabled as well [...] being sent
or received. This presents new challenges that need to be
resolved.

In order to achieve bandwidths which are dramatically higher and
still achieve other requirements, a new system architecture is
needed.

/ . w^ma filed on the same day as the 

This ap/lication is being ^^^,^^3^ po9-99-015, 

r.rz r;.-": r.r.»: - 

P09-99/031. 
rtiract data processing access In 
in apparatus for providing drrect environment 
. networ. computing system -v-o-n- T V_^^ ^^^^ 

a mam storage ""^ ^^^r^L^, eo^unlcation with an 
application servers and s n P ^^^^^^^ 

interface element. The in application user(s). 

adapter and can be ""-^f ^^J/^ main storage that can 

one or more feLupts m the running programs, 

handle data without causing 

Xnco^m, -ata 1= received us.n, «e P_^^^^^^^ ^yste™ 
received or modified, '"^J'^^^j;,,,, or change. Data is then 
„U1 be updated to reflect the new „„i,iple 
processed in the .aln -""^^ f ,,„„l«neously and forward.n, 
existing queues In the ■"-";^°f;,,3,,„,uon or application 
then, m turn to "/"f/^t.^ „ade by Interrogating these 

server after a determination has 

queues . 

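The subject matter which is regarded as the invention is
particularly pointed out and distinctly claimed in the concluding
portion of the specification. The invention, however, both as to
organization and method of practice, together with further
objects and advantages thereof, may best be understood by
reference to the following description taken in connection with
the accompanying drawings in which: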

Figure 1 is an illustration of a network computing environment
utilizing a channel subsystem and a control unit;

Figure 2 is an illustration of a network computing environment
as per one embodiment of the present invention, [...] channel
and control unit functions while Figure [...] the interface
element;

Figure 3 is an illustration of a queuing mechanism as per one
embodiment of the present invention;

Figure 4 illustrates SETUP SDU fields;
■ / . for the command request 

/the format for tj?« 

, . U.3«a.o„ o. ..a. ...... 

.= oer one embodiment of tne p 
'"^"^ . of tne contents of output 

..,u.e B . a ta.u.a. ^r^Lention, 
'0 L ner one embodiment of the p 

Figure 9 is an example of a queue information block content as
per one embodiment of the present invention;

Figure 10 is an example of a [...] as per one embodiment of the
present invention;

Figure 11 is an example of a SLIBE block content as per one
embodiment of the present invention;

Figure 12 is an example of a Storage List content as per one
embodiment of the present invention;

Figure 13 is an example of a SBALE content as per one embodiment
of the present invention; and

Figure 14 is an example of a Storage-List-State-Block content as
per one embodiment of the present invention.
, »„ existing data processing system 
An example of an existing ^^^^ p^^^^^ 1, 

architecture is n.i. storage 110, and one or 

information is passed between the m ^^^^^^^ ^^,„, 

::re input/output devices ^'^^^^ ^^^^l^,^ ,,,, channel paths 
lannel subsystems 150 .^--fj^'^ one or more control 

information. 

.he main storage UO stores data a^ -r/ctrireLable 
■ ^« iqn Main storage 
input from I/O devices 190. „ central 

J provides for high speed . one example of a 

processing units and one or mo« ^'O ^ „ea (not 

™ i« a customer's storage a .4.„.. information 

main storage is a c information or store int 

shovn) . I/O devices 190 recei 

in main storage. Some ^ " .irect-access storage 

readers and punches, -^^netic t P .^ieprocessing 
■ (DASD), displays, keyboards, P j equipment, 

devices (DRiU), „„,,o\lers and sensor-basea 

^,,ices, communication controller 

. .he Storage Control Element 
.he main storage is ^^'^^'^^^^ ^^l'^ Ire central 
,SCE, 120 which in turn IS coup ing unit(s) is 
processing units (CPU) "O;^ f ;„,,,,in, system and typically 
L control center of ^^^^J^^^ ,,,iii.ies for instruction 
comprises sequencing and P";-/ „lated functions, 
execution, initial program ^.^ , bi-directional or 
._n,. r^oiinled to the boc- 

,ne CPO is usually coupled to the SC ^^^^^^^^ ,,,,„,ion and 

„ni-directlonal bus. ^^^'J channel subsystem, is 
queuing of requests made by the C ^^^^^^^ ^^^^^^^^^ 

loupled to the main storage, CPUs 
different busses. 

.v,^ flow of information 
. «i subsystem directs the tJ. 
The channel suDsya relieves the cru 

tasl. of co«.unlcatxng V ^^^^^^^ 

Mocesslng operations _,^tions. The channel 

' .„vlv with I/O processing operat ^onmunlcatlon 

=°"'"r "uses one or more channel paths as the ^^^^^^^ 

subsystem uses o information to or x 

UnKs m managing the now of i ,,,ated 

«i nath consists ot on« r-ontrol units. 

Each -^-"^\f subsystem, and one or more contr 
^ within the channel Y processor is also 

3 one preferred embodiment, a SAP 
S Jot the channel subsystem. 

= y part of tn -v^ip to have one 

X / 1 it is also possible to n 

lii ^ \ v,^ <,een in Figure 1, f;,hric (network of 

^\jV) lis can be seen j- switching fabric v 

15 switches) incluo J ^^^^^^^ „„it is lu 

O and the control unites) ^^^^^^^^ _ 

W ,la a .us to one o^ .ore 

S subchannel is the means by which ^^J^^^^ ,,„„al 

rovides information about "-"^"'/^^..^ation by executing 
provides this into ^^^g^ 

,0 processing units, tn i consists of interna 

X/0 instructions. The subchanne ^ ^^^^^^^ ^^^,„^ ord 

that contains information ^".'"^ ^^,,ce number, count, 

address, channel pat^ i-tif^^.^ ^^^^^^^ -':.:'T 

„ rir^rrnr^rariiab.^^^^^^^^^^^^^^^ 

" -Trinst:::rstrrsignate tne subcnannel 

executing I/O i^st- 



associated with the device 
. . IS accomplished by 

• „ o£ input/output OP^"'"-"" subsystem and 

The execution of inP channel s ^ 

.he Oecocm, o. CC«s '-P— U^nlt the 

,„put/output devices^ * ^^^^^^^,3 the cont ^^^^ 

,s initiated when the ch command -^d 

eo^and --^-"^/J^ed chain o. T^^^s, and the 

control unlt.s,. achieve bandwldths which 

„,alned earlier, m orde g^it 
''^ '"^ UV higher and move Iron, 100 
are d««"=aUy hig^_^^^^^__ ,„provements 

technologies, a present 

. depicts the ne.0- e.lr-^^^ -n- s..-^"^ 

— . "-%rrir^ .V an ::rnt and a 

control -^"^^^^^ connector Ihterface 
300 along the P-^*" " aso components o£ 

»et«orK „a 260 "spectlvely^ J^^ P ^^^^ 

1 ^^str a^- - " ^^::^es a much 

i ""^rre o -e existing --";;7,;;:s:ing the need for 

Structure o .f.^ient system by ^^..^ns such as the 

master and more e^"; ,,,ulred functions ^^^^ 

,<rtressinq many of the ex eliminating the n 

multitudes of Channel co^ands.by 

e-htfinS . / 



.recessing steps. . 

„ depicted in the/ 240 ca ,„cy 

V^nuector W^-;^^^^ .hlch is used f ^^^^^^^^ 

SOO it P""=so?^. - cards. ^/"hereinafter STI bus 

c>\^-^^°r:rara seif-^-d ^-.-e 

device ^cn embodiment- 

Ai- 230) as used m 
(shown /at -^-J"' 
*. +-0 the main 

J ..e Connecor 1 turn can ^ 

invention, ~nnectJ t^- wh.ch 

connected to - f „«s and other i„g 

shown at 220 sucfc as » „ element xs ^" P jjo via 

severs. W°r;:r;e"wor. Interface ^^^f^/rPeripherai 
.on«unicatio/«.tH the ^f;, shown at 250 as 

another dir/ct =^ ^^einafter PCI »« = ^^.ice 

ControUe/lntertace bus ^^^^^^ J.ocal storage 

• rtie emDodlment ol tn v ^^.^ors and some locai 
„sed one or more processor the 

'"^^'T; the retwor. Interface Element C .ppUcatlon 
"^""A interface Element is "nnected to 
Setw/rH interi Notes 
use/s depicted at 

br/wsers. ^^^^^^ j„„ servers 

7 .ta streams ^ - rrsirr 
- Tirs:r.rre^ Pi-utv o. 

»d stor^ o. z:::t:^-^:z^^^ 

of bypassing any ^° ^^tworK is then updated ^^^^^^^ 

— • r a^^p-- -rre-r^ated 

the Changes. On ^^.^^^ ,„^„es are „ server 

^"eruX t^ determine the -P-^^ri , "™ 
Tt t- data'needs to be sent to^ --^^^ ,,,„,„t to the 

-s also transmitted via the I ^^^^g and 

servers is also by est 

application users in the 
Interrogating the queues. 

...nlsm needs to be expi ..ferenced to 

queuing „t ihvention is refer 

,ne queuing -chani» o« P ^^^^^^^^ ^X^/^oth may be 
as the Queued Direct output queues 

communication stacks. ^ 
, are provided, the program 
.,ea wnen the QDIO mput ^"^f/ ,ueues by the 
'""rectlv aocess data placed .nto / source of 

can directly j^^e Element. "^^ ^^^^ an I/O 

adapter(s) of the 1 .^^^^ ^^^^3 originate ^^„^^,,<,. 

,ne data P^-^^ f/^ to v,hlch the ^^^^^^J .he 

device or network o ^^^^^s are P ^^^^ 

correspondingly, -n- t- QD ^^^^ Adapter. 
,„gram can transm- data ^^^^^^ r.rinternally 1>V 

into the appropriate P ^^^^^^^ ^e used 

the data P^-^^^^Vtlsmltted to one or more I/O 
the adapter or may he ^^^^^^^^ 

t are located In the program storage 

and are separate £ro ^^^^ provided * 

embodiment up to 240 q ^^^^ed to minimize ^"'^ ^ 

storage interface is also pr ^.^hanlsm provides 

,hoad Each queue set 1 preferred 

other overhead. (^ound queues! in one P 

— :trd - at .ast - -mr comprises 

^"TiraUo; rasslqned to at least one que- ^.^^ 

' .or input or ^---fr,-- ^^^^ ^ 

. one or more ^-apters J" ^ blocKs for ■ 

useable buffers and also ^^^^^^^ prior 
incomlng/outgoln, -t needs, .t ^'^-^'^^^ are 

^^"^^%r;«-n desired or a change - -^;;::;„:aUy static 
= Bubsequent iv « ^^^^^^,,,„„,„ . Queues ar 

-rr a^;^^:rrem-"-^^' 

30 reflect the latest nature 



PO9-99-014 






toraqe is used 

QDIO adapter, a structures, called queu ^^^ide the 

multiple separate data st characteristxcs and Pr 

collectively descr.be the q ^^^^^^^^ between 
necessary controls to alio 
program and the adapter. 

. .tatus block is established to re 
A Queuing status u /q activity status. 

.,na.ically as per the ownership in the channe. 

. lp.lse buffers ^,,30 gets updated as the p.c-e 

i tJthe r:.^ual syste..^-^^^ 

„.ere separate /^jLsigned a separate queue set 

virtual system can also 
queuing mechanism. 
Exchange of Data

a state change 

.ne pro,ra™ an. tne OO^^^^;^;;,::,, ex« o. data. 

input and output buffers 
nd the adapter by placing the ^^^"^^ 'J'^^ iZ aside and 
program and the a" f «r.pcial location that 

hich are maintained in a -P^^^^^ ,3 tor input queues, 

which ar buffer. For examp adapter 

7:::TZ. receive. £ro. tne "-/f/^J/^^,.. .or eacn inpu. 
r/ters that are in the input -""^ ,,,,«r, the state ot 

Tu/fer that has data placed into .t by ^^^^^ 
. tutfer is changed Iro.. input (^^ch as round 

" The program then examines in -^^^ ,11 qdIO 

::::: -trsttte^. aii ---:rin r:-- r 
:r«:rrrd :rate. upon co.p--;:;r.„«er 

t:rr ;re-. ^-nrrrrrhet^r a^Ua.^^^^^^^^^^^ 
- inP- -"-^ ;:r//: re^uent input data "J" -;;;::: 
"htr Progra» changes the ---^^ 
riueue .u«ers «ro. pri«d to empty, i^ ^^^^^^^^ ^^^^^ 
r^ i— "'>::rnr::rrinput .ui^ers are now 
to signal the adapter that 

Similarly, .or output ^X"t ^a into one or 

o« the QOIO adapter, the P-J^jn:, in - output .u«er 

Tatrrr-- - --rrtar/err^u^er 
b^arrrtrrp^^^^^^^ 

^ , . ....emitted to the I/O devi adapter 



30 



rs^nrtrara^er that one^-ricrrac:^:; the adapter. 

V. 1-ransmitted to the I/O devi adapter 
data to be transmi ^ the program, the w 

--TsThe^" ---- :r:;r.i::. -n 
" ^"'^ " 
h state of each 

suffer also has an ownership state 
^Additionally, each data bu«er ^^^^^^^ 

«nicn Kientmes --^ ^ ^ iHo. tn. - : . 

^^-r-ollinq element of tne d processing the buii 

controliiny managing and pru „r-i oritization 

element is responsible for provides for a prioriti 

11,, the queuing mechanism y ^^^^ 
Additionally, the q addresses are usea 
v/ol structure overview for the 
ngure 3 depicts the contio ^ ^^^^ 3Ubchannel. 

,„put and output ^/^U components as -^-^ 

n,ure 3 also demonstrates J „tor.ation BlocX (QIB) 

the present ^"-''"""^I'X.out the collection o£ QWO ^ 
,t 310 contains Intormationr subchannel. It 

H output queues associat4d with a g ^^^^^^^ ,„^„es 

and output Aollection of mpu 

provides information subchannel. One QIB is 

r/the adapter associated with the ^^^^^^ 

onio subchaAnel; Figure P present 
defined per QDIO sud / embodiment of tn 

queue-information bloc*: P 

invention. / 4. -ioq 

/ , 1 (QTTB'* shown at 

i/st information Block (SLIB) ^^^h 
The Storage Lytst Int g^ored pertaining 

provides for the iddress of nfor^ ^^^^ ^^^^^^ 

queue, one SLi/ is defined for e ^ ^^^^^^ ^^^^^^^ 

fnformation f ^^/^ro^rrioc. en..- .^a-ng 
called storag^-l"t in ^^^h queue. 

i„tor»tlon /l.out each o£ the ^he present 

provides s/b format as per 

j« list information blocK element 
invention. Furthermore a stoja^e ^^^^^^^^ .egardln, the 
« SUB. 7 -rrer^Xv -responom, s. entrv 
fZTn Tep'ot. a sample ZlBB content. 

/ ,t 330 defines the SEAL or 

.He Storage l-l.t or L shown at 33^^__^^ ^^^^ 

,.ora,e f>iocK address ^^atd «ith 

associated with «ch quiu- , ocate^ 

„.ich contains an ent^ *;;^„^,,,„ ,,out the I/O 
the queue. SL providefe embodiment of the pr 

"^^te: a-o lu/e storage ^^--;;;;jrte ad-esses of tne 
provides the J ^^.^^ ^ of absoi ^^^^^^^ 

list, in turn, SB«L ^3^,3 up one of the a 

storage bloclcs thai: -^^'^^'''^'^o.n at 340. A storage block 
associated with e/ch queue as show ^^^^ ,,,, 

ent^y or SBALE is also pr ^ 
address list enyy absolute storage a 

o.T Each SBAltE contains tne ^^^._e blocks addressed oy 

SEAL. Each f ^^^ly, the storage 

storage block. / CoH constitute one of 

ot the entries of a single SB ^ ^^^^^"^^,.,1 128. 

•hiP ODIO /buffers of a QUiu ^ buffers equal 

possible QDI / ^^^^^ ' °E as provided by one 

embodiment, tJhe n ^ SB^LE as p ^^ntain 

. ^« 13 orivides for the gB^^L Flags coi 

Figure 13 prj^ ^^^^^^ invention. ^^^^J SBAL 

embodiment ^f th P ^^^^^^ borage block 

inf ormatiot^ about the ^^^^^ ^^^^ storage 

containing each SBALE and not 3 ^^^^^^^^^^ of 
associated with each SBALE- T ^^^^^ ^^^^^^ 33,,. 
SBALE fie/d is <^i^^erent 

/ , . <;tsB is shown at 350. 

, /tora,e-Ust-State B^^^ "J^^^f^.^.e state information 
SI.SB cltams state in^""";;;' ,„,ue. * 0^° 
^° r E-rclu^tr ofstr^^^ :.oc.s that ^n . located 

. iorage-block-address 
of the addresses in a -^-fjl an SLSB entry, 

using all of th ^^^^^^^ ^""7 can change the state 

Ust. Depending on ^„,t can _^ 
,i^ner th. P'^°5"'";/„io buftar by/tor.ng a new 
o< the correspona-, QO ^ 3a.pi^ SLSB t o^at as P ^ 

gutter i= owned by ^''-/'T output buffer. B.ts 

wnetne. tbe buffer ; an f P -"^t^Hr.e 

— > ' '"r„:" -"-ronrreip-. - 

the buffer, m I^„t configurations. ^ „^iie 

Wentif led to mean dif ff ^^^^ , „ buffer 

ho established! to inu . ownership ano u 

«to can be est j control unit ^^^^ 

hits 1 and 2 previa 7 ^.^^^^^^ a binau 

buffer 

type respectively. of the " 

indicates the o""/' f/.^, ^.ta storage), P"»t\ , or halted 
^ such as empty (V^--^^^^^, ,„ot ---;:rturi ' ''V 
processed ^^^^ "-^'"/"r (associated buffer 

'^"""LecXg Halt subchannel,, and;-; meaningful)- 
. program ^^^f'"^ contents of buffer 

IS in an er/or state 

, oBs are storage blocKs 
storage Blocl^« « f = buffer. 
. «iv to define a singj- 
collectively to ^^^^ 

30 exchange data d 
more input queues 
„ ... p.o,ra» «.«u» nu..er of such 

.„.o. o..i. .-s -^,3 on ..e an. .oae. o. 

the adapter. command. 
Store_Subchannal_QMO_ 

• 1-hG main storage 
2, The program "^^^f ,,,p«r by use of an 

designates a QDI" 

. „f the establish, 
3, upon successful "--^^-^.^s the gueues at the 
,0X0 gueues command, the ^^o.^^^^^^^ ,„,o.guaues channel 
QDlo_q executing an actxv _ subchannel is 

="°rd"Tpo7irs successful ---^;„rthe .010-active 
command. uP ^annel-active state this, 

placed into the s used to ac 

state, ^g-i" \f "',i,e QDIO-gueues ""^"■'^'2„a when Start 
Alternatively, the -^"J ; QWO-gueues command 

"l trrcrted - the previous step, 
subchannel and 

, upon — - -rnsX:- to each other by 

the adapter can ^ ^''^r'^ns in a 

appropriate use of the g^^ ^^^^^^ ...ociated, 

subchannel, wit^ - and QMO-active state. 

sub-channel ^,,t the 

5, ^ny action that «uses a t« 
. «nd QDlO-active states „„^.ated with the 

subchannel.actrve and ^^^^^ ^^^„,s assocrat ^^^^ 

" stop e^amlni^ an^^P^^^^^ ^ ^^^^^^^ ,„,,,ated 
subchannel, i^i 
a QDIO subchannel, an 

...t affeo.s .ne ^^^ f^/f/.^.^, penalng with aX«t-st=t- 
,,,ive subchannel to -"'^ ,,,,o„ initiated by the prog 

. or a teset/reconfiguration adapter to 

state, or r ^^^^^^^ „j the QD ^^g^e- 

^i-nath command tnax- u associated, 
channel patn subchannel is ass 

^1 nath to which a yui-j 
channel path ^^^^ ^^^^.^^ 

f the oresent invention provi stacks. 
The design of the pr „„itiole coBmum^^tion 

this device across multipi multiple 
share access to this ^^rtual guests and/or 

multiple priorities ™"^'^^^„,s. for .applng various 
logical partitions. * « ^^^^ „,„ocode rs 

.esources to ^i'^^"^^ ^^^Zl^ .iiocation and dynamic 
devised to °" <,ie point ot definition. This 

.VI on including single v ^.^.-ce to facilitate 

1 configuration, i ^^^^ interface 

, mechanise ^-^"^;/;j:„„,i^.ation parameters -O^^;^^^^ 

' -rr r:r:rriorof contra .... ;rross 
r:::- the 7-;— .tr:-- — 

.s the data comes in through tue ^Z^. -e 

assigned to it and in "J' -fJ^^ still operates in the 
.hannel subsystem in t- „e« man^r 

30 traditional mode for the , ^n interrupt free 

explained ^ound traffic has to allow for 

outbound traffic. The 
it is not always obvious as 

rate. Hence, inbound xnter 



rates 



„ are processed by a QDIO adapter 

eot. input - --rs 

I „ .Ke iowest --"^^Jt: "Lst priority. 

S and tne bi^nest numbered ,ueue nas t 

a the adapter processes prrmed 

ft Por output ,„,put queue before 

% processing buffers 

ll o^tP^^ '^^^^^^ is dependent on 

5= =.^j,nter processing is a«F 

;0 3) For input queues, adapte P ^^^,,g,,ed. For 

^9 . of QDIO channel path to -^^-"^ .^.pter processes 

- rr:xir:-it. of the^, 
--^rdrir^^^^^^^^^ ^^^^^ 



20 placing 

associated priority 
„ oependin. on - ^yP;^"- ,„eues, vice versa, or 
input cueues -V nave P"or^ V ^^^^^ 
- -nned priority may 

„ .or botn input and o'^^^-- ^ the 
ii a sequential round robxn manner 
processed in a » vi 
„ called buffer 0, and continuing 
.„f£er associated with SEAL 0, oal ^^^^ buffer, is 

' .„ the input buffer empty 

.or input queues, each buf£er - ^^^^^^^ .^counters a 

3tate is se,uentlallV P--^ „ more ---- „ 

r rapferthen Usses the ^:.«ar 

r.-- - — n rf« : " ^t: . detected^ «hen 

ToHtrhr--- "-r :r /rtetr :r 
t: i= - -rrs ^e-" ^ n\rpr:red 

devices is detected. This P ^^.^^ ^.^^ ,t xs pr 

— rrrs rserent..;--: other 

and the adapter r rf the Input buffer i 

' --^1 'Xrtrinatrs the processing of all queues 

i \„clated QDIO subchannel, 

the associate 

.or output queues, each -^^^^^^J/; ^/counters a buffer that 

algorithm causes the. F ^^^^ ^ot m t 

cutput queue^ When an outpu^^^_ — ^-:rfrer i: e.ptv, 
" rtat-"- - """st state, the 

^ te rrrsrorsi:::;^-^: — ^^^^^ 

ofaC: ""l/rsrers o::.. the adapter 

30 -he .odel, -hen one or ^ s»e 1/0 buffer that was 

again accesses the S 
f these states, the adapter 
, detected as being m one o£ „„„ In 

spends processing o. that -^-^/^^^^^...sed and the 

„aln state, the ,,„,i„ing Queue 
the cost to the ULP 
The SIGA instruction works almost like a wake-up call, reminding
the system to go and check its queues and process what is
pending. It functions as a mid-I/O intrusion instruction that is
designated for the checking of the queues. It is an I/O
operational signal structure which, in the case of its
synchronization flavor, synchronizes the data in the queues to
ensure the state information is pushed out and the queues are
processed. It can be initiated by a program timer if desired.

In a preferred embodiment of the present invention, the SIGA
comprises an eight bit function code and, if called for, a 32 bit
parameter is transmitted to the adapter. The following is an
example of a SIGA structure.

I. SIGA

SIGA   D2(B2)   [S]

+----------------+----+------------+
|    Op Code     | B2 |     D2     |
+----------------+----+------------+
0                16   20          31

General register 0 contains the function code which specifies
the operation to be performed by the adapter. General register 1
contains the subsystem-identification word, which designates a
QDIO subchannel and, by implication, the QDIO adapter that is to
be signaled. Depending on the specified function code, general
register 2 contains a 32 bit parameter. The definition and
purpose of this parameter depends on the function code. When the
function code specifies either (1) initiate-output queues, or
(2) initiate-input queues, general register 2 specifies which
input or output queues are to be processed by the adapter.
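
The register conventions just described can be pictured with a
small Python sketch. This is only an illustration under stated
assumptions: the class name SigaRequest, its field names, and the
example subsystem-identification value are hypothetical and do not
come from the specification; only the eight bit function code, the
32 bit parameter, and the assignment to general registers 0, 1
and 2 are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SigaRequest:
    """Hypothetical model of the operands of one SIGA execution."""
    function_code: int   # general register 0 (eight bit function code)
    subsystem_id: int    # general register 1 (subsystem-identification word)
    parameter: int = 0   # general register 2 (32 bit parameter, if called for)

    def __post_init__(self):
        # Enforce the field widths stated in the text.
        if not 0 <= self.function_code <= 0xFF:
            raise ValueError("function code must fit in eight bits")
        if not 0 <= self.parameter <= 0xFFFFFFFF:
            raise ValueError("parameter must fit in 32 bits")

    def registers(self):
        """Return the general register contents keyed by register number."""
        return {0: self.function_code, 1: self.subsystem_id, 2: self.parameter}


# Illustrative values only; the subsystem-identification word is made up.
example = SigaRequest(function_code=0, subsystem_id=0x00010005,
                      parameter=0x80000000)
```

Keeping the three operands in one immutable record makes it easy
to validate the widths once, before anything is signaled.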
Function code 0 / Initiate Output - When the function code
specifies initiate-output, the associated QDIO adapter is
signaled to asynchronously process one or more output queues
associated with the specified subchannel. In this case, the
instruction is referred to as SIGA-w (SIGNAL ADAPTER - write).
The output queues that are to be processed are specified in
general register 2.

Function code 1 / Initiate Input - When the function code
specifies initiate-input, the associated QDIO adapter is signaled
to asynchronously process one or more input queues associated
with the specified subchannel. In this case, the instruction is
referred to as SIGA-r (SIGNAL ADAPTER - read). The input queues
that are to be processed are specified in general register 2.

Function code 2 / Synchronization - When the function code
specifies synchronize, the virtual machine is signaled to update
the data queues' SLSB and SBAL entries in order to render them
current as observed by both the program and the QDIO adapter. In
this case, the instruction is referred to as SIGA-s (SIGNAL
ADAPTER - synchronize).

SIGA-s is required in virtual machine models where QDIO data
queue sharing between the program and the adapter is simulated by
the use of separate unshared copies of the queues' SLSB and SBAL
components. One copy of these components is used by the program
and one copy is used by the adapter. The execution of SIGA-s
signals the virtual machine to update these unshared copies for
the data queues as necessary so that both the program and the
QDIO adapter observe the same contents for these queue
components.
When SIGA-s is specified:

1) The output queues for the designated subchannel that
are to be synchronized are specified in general register 2.

2) All input queues for the designated subchannel are
synchronized.

3) The QDIO adapter is not signaled.

4) The virtual machine is signaled if the program is
executing in a virtual machine environment. No virtual machine
signal is generated when the program is not executing in a
virtual machine.
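
The four rules listed above for SIGA-s can be summarized in a
short Python sketch. The function name siga_sync_effects and the
shape of its result are hypothetical. The sketch also assumes that
bit 0 of general register 2 is the leftmost (most significant) bit
of the 32 bit parameter; the text states only that bits 0 through
31 correspond to queues 0 through 31.

```python
def siga_sync_effects(output_mask, established_input_queues, in_virtual_machine):
    """Describe what a SIGA-s execution does, per the four rules above.

    output_mask              -- 32 bit parameter from general register 2
    established_input_queues -- queue numbers of the subchannel's input queues
    in_virtual_machine       -- True if the program runs in a virtual machine
    """
    # Assumption: bit 0 is the most significant bit of the 32 bit word.
    output_queues = [q for q in range(32) if output_mask & (0x80000000 >> q)]
    return {
        "output_queues": output_queues,                     # rule 1
        "input_queues": sorted(established_input_queues),   # rule 2
        "adapter_signaled": False,                          # rule 3
        "vm_signaled": bool(in_virtual_machine),            # rule 4
    }
```

For instance, a parameter of 0xC0000000 selects output queues 0
and 1 under the bit numbering assumed here, while every input
queue of the subchannel is synchronized regardless of the
parameter.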

For the SIGA-w, SIGA-r, and SIGA-s functions, the second
operand (B2D2) is ignored.

When the SIGA-r, SIGA-w, or SIGA-s functions are specified,
general register 2 specifies a 32 bit parameter that designates
which input or output queues are to be processed by the adapter.
Bits 0 through 31 correspond one for one with input or output
queues 0 through 31 respectively and are called queue indicators
(QI). Additionally, both input and output queues are prioritized
by queue number with the lowest numbered queue (queue 0) having
the highest priority and the highest numbered queue (queue 31)
having the lowest priority.

When a queue indicator is one and the corresponding queue is 
valid, the QDIO adapter is signaled to process the corresponding 
input or output queues. When a queue indicator is one and the 
corresponding input or output queue is invalid, the queue 
indicator is ignored. 
A queue is valid when it is established and is active. A
queue is invalid when it is not established, is not active, or
the model does not allow a queue to be established for the
corresponding queue indicator.

When the queue indicator is zero, no action is required to
be taken at the adapter for the corresponding queues. When all
queue indicators in general register 2 are zero, the adapter is
not signaled and no other operation is performed.
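
The handling of queue indicators described above can be sketched
in Python as follows. The helper names are hypothetical, and the
sketch assumes that bit 0 is the leftmost (most significant) bit
of the 32 bit parameter, which the text does not state explicitly.

```python
def queue_indicator(queue):
    """Return the 32 bit mask whose only set bit is the indicator for `queue`."""
    if not 0 <= queue <= 31:
        raise ValueError("queue number must be in the range 0 through 31")
    # Assumption: bit 0 is the most significant bit of the word.
    return 0x80000000 >> queue


def build_mask(queues):
    """Combine the indicators of several queues into one parameter value."""
    mask = 0
    for q in queues:
        mask |= queue_indicator(q)
    return mask


def queues_to_process(mask, valid_queues):
    """List the queues the adapter acts on, highest priority (queue 0) first.

    Indicators that are one for an invalid queue are ignored; indicators
    that are zero require no action.
    """
    return [q for q in range(32)
            if mask & queue_indicator(q) and q in valid_queues]


def adapter_is_signaled(mask):
    """The adapter is not signaled when every queue indicator is zero."""
    return (mask & 0xFFFFFFFF) != 0
```

Because the list is built in ascending queue number, it is already
in the priority sequence the text prescribes.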



Subsequent to the execution of SIGA, the QDIO adapter
associated with the designated subchannel performs the specified
function. When the SIGA-w function is specified, the adapter
processes each specified output queue in priority sequence. For
each queue that contains queue-buffers in the primed state, the
data in the buffers is transmitted and upon completion of
transmission, the queue buffers are placed into the empty state.
This process continues until the data in all primed output queue
buffers, for all specified output queues, has been transmitted.
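
As a rough sketch of the SIGA-w processing just described, the
following Python function walks the specified output queues in
priority sequence and moves each primed buffer to the empty state
once its data has been handed to a transmit routine. The buffer
representation (a dictionary with "state" and "data" keys), the
transmit callback, and the function name are hypothetical.

```python
PRIMED = "primed"
EMPTY = "empty"


def process_output_queues(queues, selected, transmit):
    """Transmit every primed buffer of the selected output queues.

    queues   -- dict mapping queue number to a list of buffer dicts
    selected -- iterable of queue numbers chosen by the queue indicators
    transmit -- callable(queue_number, data) standing in for the adapter's
                transmission towards the network or I/O device
    """
    for q in sorted(selected):              # queue 0 has the highest priority
        for buf in queues.get(q, []):
            if buf["state"] == PRIMED:
                transmit(q, buf["data"])
                buf["data"] = None
                buf["state"] = EMPTY        # set on completion of transmission
```

A caller might collect the transmitted data in a list to observe
that queue 0 is drained before queue 1, whatever order the queues
were named in.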

When the SIGA-r function is specified, the adapter processes
each specified input queue in priority sequence. For each queue
that contains queue-buffers in the input buffer empty state, data
is placed into the queue buffers as it is received and upon
completion of the transmission, the queue buffers are placed into
the input buffer primed state. This process continues for each
empty queue buffer in sequence until a buffer that is not in the
input buffer empty state is reached. This process is then
repeated for the next lower priority input queue. If any queue
buffers for all specified input queues have been filled with
data.
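
The SIGA-r processing can be sketched in the same spirit. In the
Python below, the buffer representation, the function name, and
the per-queue list of pending inbound data are hypothetical, and
for simplicity the scan of each queue starts at its first buffer,
whereas an actual adapter would presumably resume where it last
stopped.

```python
INPUT_EMPTY = "input buffer empty"
INPUT_PRIMED = "input buffer primed"


def process_input_queues(queues, selected, inbound):
    """Place received data into empty input buffers, queue by queue.

    queues   -- dict mapping queue number to a list of buffer dicts
    selected -- iterable of queue numbers chosen by the queue indicators
    inbound  -- dict mapping queue number to a list of received data units
    """
    for q in sorted(selected):              # queue 0 has the highest priority
        pending = inbound.get(q, [])
        for buf in queues.get(q, []):
            if buf["state"] != INPUT_EMPTY:
                break                       # stop at the first non-empty buffer
            if not pending:
                break                       # nothing more has been received
            buf["data"] = pending.pop(0)
            buf["state"] = INPUT_PRIMED     # hand the buffer to the program
```

Stopping at the first buffer that is not empty keeps buffers in
sequence, so a later empty buffer is not filled ahead of one the
program has yet to drain.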
Shared State Interface Control

Another important aspect of the present invention is its
ability to share a state interface. The Shared State Interface
Control or SSIC function, which provides a shared state interface
between the QDIO adapter and a QDIO program, such as a multipath
channel program, can best be described in the following diagram:



WRITE

QDIO Program                      State         QDIO Adapter

Fill 'n' SBALs with data
set state to              ------> primed
(multiple SBALs may be
processed)
Issue SIGA to drive the
adapter
                                                Process all outbound data
                                  empty <------ set state to
Program frees 'empty'
write buffers after SIGA
'last ditch' timer will free
any lingering buffers
READ

QDIO Program                      State         QDIO Adapter

If required, replace used
buffers for multiple SBALEs
within each SBAL
set state to              ------> empty
                                                Fill inbound buffers
                                                for each SBAL used
                                  primed <----- set state to
                                                low traffic - new PCI
                                                else nothing
Drain data and pass to ULP
Replace all used buffers
set state to              ------> empty
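
The WRITE and READ exchanges in the diagram amount to a small
shared state machine, which the following Python sketch tries to
capture. The table and function names are hypothetical, and the
sketch reduces each buffer to the two states that appear in the
diagram (empty and primed); it is not meant to model error or
halted states.

```python
# (direction, acting party, current state) -> state after the action
TRANSITIONS = {
    ("write", "program", "empty"): "primed",   # program fills SBALs with data
    ("write", "adapter", "primed"): "empty",   # adapter has sent the data
    ("read", "adapter", "empty"): "primed",    # adapter filled inbound buffers
    ("read", "program", "primed"): "empty",    # program drained and replaced them
}


def next_state(direction, actor, state):
    """Return the buffer state after `actor` acts, or raise if not allowed."""
    try:
        return TRANSITIONS[(direction, actor, state)]
    except KeyError:
        raise ValueError(
            f"{actor} may not change a {state} {direction} buffer") from None
```

Each party changes the state only of buffers that the other party
has handed over, which is what allows the state to be shared
without further coordination.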

II. Store Subchannel QDIO Data or CHSC Command

Input/output operations for QDIO involve the use of an I/O 
device represented by a subchannel in the channel subsystem. The 
proper execution of QDIO I/O operations depends on certain 
characteristics of the subchannel. Examples of such 
characteristics are: 

o whether the subchannel supports QDIO operations 
o the format of the queues 
o the number of input and output queues 
o I/O-device requirements regarding program issuing of 
the SIGA instruction. 




The store-subchannel-QDIO-data command provides the program 
with a way to determine from the channel subsystem the QDIO 
characteristics (listed above) that the program must take into 
account in order to perform I/O operations using a specified 
subchannel. Previous mechanisms that allow programs to determine 
operational characteristics of I/O devices normally consist of 
the program executing a channel program to obtain such 
information from the I/O device. 

By providing the store-subchannel-QDIO-data command, it is 
possible for I/O devices to have different QDIO characteristics 
and for the program to determine what those characteristics are 
prior to communicating with the I/O device itself. 

The CHSC command is used to obtain self description information
for the QDIO adapters associated with a specified range of
subchannels. When the CPC is operating in a mode where several
images are used, the CHSC command is used to obtain self
description information for the QDIO adapters associated with a
specified range of subchannel images configured to the logical
partition that executed the command. Information for subchannel
images configured to other logical partitions, if any, is not
provided. Figure 5A represents the format for the command request
block for the store-subchannel-QDIO-data command. Figure 5B
represents the format for the command response block for the
store-subchannel-QDIO-data command. In addition, Figure 6
represents the format for the Subchannel-QDIO Description Block.

In short, the CHSC command specifies which device the request
for processing can be sent to. It further provides for the format
and attributes of the QDIO, such as the size and attribute of the
queues, and other characteristics that may relate to the specific
processor. QFMT (QDIO Queues Format) and QDIOAC (QDIO Adapter
Characteristics) in the above figures represent this information.
IQCNT provides the Input Queues Count and OQCNT provides the
Output Queues Count.

III. QDIO Priority Instructions 

The user can issue a request leading to a SETUP_REQ
instruction. When processing this instruction, a device address
will be assigned to the user, which will be passed along via a
SETUP SDU instruction. The SETUP primitive will also pass
priority queue information to the adapter. The format of this is
shown in Figure 4. Length is the length of the DIF, including
this field. Category is a primitive-specific value. Type denotes
the value of the data path device address. DEV_CUA is a
multi-digit CUA in packed format. DEV_NO refers to the device
number assigned to this ULP's connection. Priority Service Order
is the order in which the adapter will service the queues. It is
used to provide favorable service for higher priority queues over
lower priority queues. Maximum Service Limit Units refers to the
units that are used under a favored treatment based on the amount
of outbound data allowed to be processed during one processing
interval. It can be defined in three flavors: maximum number of
packets to be transmitted, counted without regard to packet
size; maximum number of bytes allowed to be transmitted; and
maximum number of SBALs that may be transmitted, without regard
to the number of packets or the amount of data within the SBAL.
Maximum Service Unit Priority provides the number of units on a
priority basis.
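
The three flavors of service limit can be illustrated with a
minimal Python sketch. The names LimitUnit, Sbal and
sbals_within_limit are hypothetical and do not appear in the
figures, and the sketch assumes that a limit is enforced at whole
SBAL granularity, which the text does not state.

```python
from dataclasses import dataclass
from enum import Enum


class LimitUnit(Enum):
    PACKETS = "packets"  # count packets, ignoring their size
    BYTES = "bytes"      # count bytes transmitted
    SBALS = "sbals"      # count SBALs, ignoring their contents


@dataclass
class Sbal:
    packets: int     # number of packets carried by this SBAL
    byte_count: int  # total bytes carried by this SBAL


def sbals_within_limit(queue, unit, limit):
    """Return how many leading SBALs of `queue` fit within `limit` units.

    Assumption: an SBAL is either serviced whole or left pending.
    """
    used = 0
    taken = 0
    for sbal in queue:
        if unit is LimitUnit.PACKETS:
            cost = sbal.packets
        elif unit is LimitUnit.BYTES:
            cost = sbal.byte_count
        else:
            cost = 1
        if used + cost > limit:
            break  # servicing this SBAL would exceed the interval's limit
        used += cost
        taken += 1
    return taken
```

With a queue of three SBALs carrying 2, 1 and 4 packets, a packet
limit of 3 would let the first two SBALs through in one
processing interval and leave the third pending.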

Data Packing 



Data packing is another important feature that is affected
by the present invention. As the cost of I/O decreases, the need
to prorate traffic to reduce the cost per data element decreases.
However, the need still exists and the present design will allow
for a multi-path channel or MPC to perform data packing through
the device driver code which "unpacks" packed data received from
the ULPs directly into a Storage_Block_Address_List array so that
packed format data is not handled directly. This approach is
taken because packed data resides in slower memory than the
Storage_Block_Address_List array provides. In addition, data
packing for small objects is supported and non-contiguous headers
for large objects are supported within a single data queue. In
this context a non-contiguous header implies the use of a single
entry for a network or control header. A preferred ULP to be
supported is TCP/IP, which will build upon existing packing
algorithms to reduce the cost of I/O by continuing to pro-rate
the cost across multiple datagrams. When an MPC is used, the
device driver code will unpack the datagrams into the
Storage_Block_Address_List arrays. To provide for the efficient
flow of large data objects, unpacked datagrams will also be
supported, but whether a given flow is to be packed or not
depends upon the size of the packet. To further optimize the
system when TCP/IP is used, TCP/IP will include a controller work
area, preferably a 32 byte header, and the start of the datagram
for all data transfers. In all cases the controller area, if
specified, must be provided by the ULP as part of any network or
control header. This includes single datagram transfers where
network headers, any control header, any defined data header and
the user data have been moved to form a continuous bit stream.
Headers must also be supplied when non-contiguous header
datagrams are used. MPC will not insert the header on behalf of
the ULP. Note that an SBALE or Storage_Block_Address_List_Element
is also defined, preferably with a 4k page limit to allow
attachment of the Queued Direct I/O to different switches such as
fiber optic switches and International Business Machines' ESCON
switch (ESCON is a registered trademark of IBM Corp. of Armonk).

Another problem that severely impacts current systems is the
lack of an efficient gather/scatter function. Since data
chaining is exposed to the remote partner, it is no longer
efficient for network communications. Yet data movements within
the server continue to be major performance inhibitors for mid-
size or large data objects. This problem is resolved by
inventing out-of-band header(s) such that the user data need
not be moved or copied in construction of the data stream.
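
The out-of-band header idea can be sketched in Python as a gather
list in which the header occupies its own entry and the user data
is only referenced, never copied. The function name
build_gather_list and the (buffer, offset, length) entry layout
are assumptions made for illustration; the 4k block size follows
the preferred SBALE page limit mentioned above.

```python
def build_gather_list(header, user_data, block_size=4096):
    """Describe a datagram as (buffer, offset, length) entries.

    The header gets a single entry of its own (out of band); the user
    data is referenced in place through the memoryview, so no payload
    byte is moved or copied while the list is built.
    """
    entries = [(header, 0, len(header))]
    for offset in range(0, len(user_data), block_size):
        length = min(block_size, len(user_data) - offset)
        entries.append((user_data, offset, length))
    return entries


# Usage: a 32 byte header followed by a 10000 byte payload.
payload = bytearray(10000)
entries = build_gather_list(b"H" * 32, memoryview(payload))
```

Each payload entry still refers to the original bytearray, which
is the property that removes the data movement described above.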

The problem of system dispatching is also minimized by
establishing a common user interface such that the user can
assist in dispatch control. When an MPC is used, the MPC will
establish a Direct Queue Area or DQA for each ULP exploiting
the network attachment. This area will be used to control the
queuing of inbound data as well as provide the control structure
to be used for dispatching options and processing.

The present invention has enhanced the existing system
support for high performance applications that wish to take
advantage of high speed media attach. The intent is to minimize
inbound dispatching by providing a set of optional mechanisms
that bypass the traditional SRB dispatch from disabled code that
occurs during current I/O disabled completion. Since there is no
change of ownership required for protocols such as TCP/IP,
the recovery procedure will no longer be needed in many
instances. Also, no assigned buffers (ASSIGN BUFFER) are
required for inbound traffic (TCP/IP). The data will not be
deblocked by the MPC or multipath channel and the interface layer
will perform the deblocking function itself. Since MPC is not
deblocking into smaller datagrams, there is no need for an assign
buffer. The operation is driven by a disable timer during mid to
high traffic rates, and all inbound queues for all interfaces
will be processed via the timer mechanism, and fast interrupt
indicators will be set off for all read data paths. This in turn
will eliminate the need for some inbound dispatching functions
like the use of the MPC supplied Direct Queue Area. The ULP will
include a user area for specific processing and the SBAL format
will include the addresses and lengths of input data. A new
function, IUTIL CM_ACT, is also provided that will contain fast
dispatching (FAST DISPATCH), which in turn will allow the ULP to
optimize its own environment.

Dynamic Configuration

In the existing systems, all Gateway-types of attachments
need to have a configuration file defined which identifies
various items. These items include the following:

1) Host Device Address - this definition is needed to
define the Host Number and Host Unit address, especially when
multiple or virtual images/machines are being used when passing
data across any channel interface. This information is needed by
the channel subsystem to determine which Host connection is to
receive the incoming data. It is also needed for each Host or
Host Unit Address which is to be used to transfer data across the
channel interface to an adapter.



2) Host Application - this identifies which Host
Application is using the Host Device Address.

3) Application Specific Address - this address is used
to identify the specific Application Server to which the inbound
data received from the LAN is to be routed. Each Application
Specific Address is directly related to the Host Device Address
and Host Application.



4) LAN Port Number - this identifies which LAN Port is 
to be used for sending data which is received at the Gateway from 
the Host Device Address. 

5) Default Routes - these are defined on a Host 
Application basis. Each Host Application can have a default Host 
Device Address specified. This Host Device Address is used to 
send all traffic received from the LAN for a specific Host 
Application for which an Application Specific Address has not 
been defined. For example, if a TCP/IP packet is received from 
the LAN and the TCP/IP address found in the packet was not 
defined in the configuration file, this packet would be sent to 
the Host over the Host Device Address defined by the Default 
Route entry. 

6) Setting Thresholds for Priority Traffic - this
defines the percentages of processing which should be used on the
various priority traffic. For example, this command could be
used to define the maximum number of bytes which should be
processed for a specific priority before moving on to check for
work for a different priority.

The present invention changes all that. All configuration 
information defined above is no longer needed in the 
configuration file. In fact, the configuration file is no longer 
required on the Gateway attachment using the QDIO Interface. All 
the information is presented to the Gateway device at 
initialization time through various tables and commands which are 
passed over the channel interface.

A table is provided which maps all the Host images and Host
Device Addresses which will be using the QDIO Interface to the
specific bits defined in the SIGA vector. This list is derived
directly from the information defined in the IOCDS on the Host.
Each entry in the IOCDS which defines a QDIO device causes an
entry to be placed in the initial table. At initialization time,
each entry in the table is assigned a specific bit in the SIGA
vector. Also, at any time after initialization, this information
can be dynamically changed and Host Device Addresses can be added
and/or deleted.
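
A minimal Python sketch of such a table follows. The class name
SigaBitTable, the (image, device address) key, and the policy of
reusing bits freed by a deletion are all assumptions made for
illustration; the text only states that each entry is assigned a
bit and that entries can be added or deleted dynamically.

```python
class SigaBitTable:
    """Maps (host image, host device address) pairs to SIGA vector bits."""

    def __init__(self, iocds_entries=()):
        self._bit_of = {}     # (image, device_address) -> bit number
        self._free_bits = []  # bits released by delete(), reused first
        self._next_bit = 0
        # Initialization: one table entry per QDIO device definition.
        for image, device_address in iocds_entries:
            self.add(image, device_address)

    def add(self, image, device_address):
        """Assign a bit to a new entry and return it (idempotent)."""
        key = (image, device_address)
        if key in self._bit_of:
            return self._bit_of[key]
        if self._free_bits:
            bit = self._free_bits.pop()
        else:
            bit = self._next_bit
            self._next_bit += 1
        self._bit_of[key] = bit
        return bit

    def delete(self, image, device_address):
        """Remove an entry and release its bit for later reuse."""
        bit = self._bit_of.pop((image, device_address))
        self._free_bits.append(bit)

    def bit_for(self, image, device_address):
        return self._bit_of[(image, device_address)]
```

Entries defined at initialization receive bits 0, 1, 2 and so on
in definition order; a later add after a delete picks up the
released bit in this sketch.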

The Host Application which is to use the Host Device Address
is defined using a command called the MPC_ENABLE-IC Command. The
Application Specific Address is defined using the SETIP command.
The Application Specific Address can also be deleted using the
DELIP command. The LAN Port Number is specified in the STRTLAN
Control Command. The Default Routes are defined using the SETRTG
Control Command. This is a new control command defined
specifically in the present invention. Setting thresholds for
priority traffic is defined using the SETPRIORITYTHRESHOLD
Control Command, which defines the maximum number of bytes which
can be processed for a specific QDIO Priority Queue before
checking for work on the other QDIO Priority Queues. This
command allows the user to tailor each individual system for its
specific application requirements.

Using this and the queue priority instructions, the specific
algorithm which is to be used when servicing each of the
different priority queues is addressed. Each Host Device has the
ability to set its own unique priority algorithm.

SIGA Vector Implementation

The SIGA Vector is needed to give initiative to the QDIO
connected Gateway device. One problem which is solved by the
present invention is the use of Priority Queues and how a
priority algorithm can serve multiple priority queues at the
specified priority values. In other words, certain queues
represented by the SIGA Vector needed to be completely serviced
on each invocation because they were the highest priorities.
Each queue at the next lowest priority needed to have the ability
to have some of its traffic left pending if its thresholds for
service were reached. The higher priority queues then needed to
be rechecked in case more work had become active while the lower
priority queues still had work pending.

To accomplish the above task, the SIGA Vector is split into
a priority bit mask. Each Device Address which was assigned to
the QDIO interface had one queue assigned for each of the
possible priorities. In one embodiment of the present invention,
there are four bits assigned to each of the different Device
Addresses. When a work request of a certain priority needs to be
sent, the bit corresponding to the Device Address and its
corresponding priority is set. As requests come in from
different priorities or from different Device Addresses, their
bits would also be set. This gives the Host System the ability
to give multiple different work requests in the same SIGA Mask.

Another problem addressed is the effective service of
various QDIO priorities when only a single bit is being used to
signal the Gateway device work. Since it is possible that all
the work for a certain priority would not be serviced before
checking back for more work for the other priorities, the Gateway
device needed to be able to remember the current work, but be
able to go back and look for more new work. To do this, the
Gateway device would write a specific value into the SIGA Vector
area after each read of the vector. Once the Host code detected
the value written by the Gateway device, the vector would be
completely cleared and then new work requests were added.

Clearing of the vector after each read enables the fairness
algorithms so the different priorities could be processed at
their desired rates.

One additional problem to be addressed is the number of bits
which need to be scanned to identify the work requests. In
one embodiment of the present invention, there is a possibility
of 240 Device Addresses. Each Device Address has 4 priorities,
so this computes to 4 * 240 or 960 possible bit settings. The
overhead of scanning all these bits to find the work requests is
too high. To make the searching faster, the 960 bits are split
into 30 different 32-bit masks. When a new work request is
added, the bit in one of the 30 different 32-bit masks is set.
Also, the bit in the Work Vector which corresponds to the 32-bit
mask in which the bit was set is also set.

The Work Vector, which identifies the specific 32-bit mask,
makes finding the bits which are set much more efficient. The
Gateway device can now scan the Work Vector to find the
appropriate 32-bit mask. The Gateway device can then just fetch
the proper 32-bit mask to find the work request.
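
The two-level lookup can be sketched in Python as follows. The
figures of 240 Device Addresses, 4 priorities and 30 masks of 32
bits come from the embodiment above; the class name SigaVector
and the device-major bit layout (bit = device * 4 + priority) are
assumptions, since the text does not give the bit ordering.

```python
PRIORITIES = 4
DEVICES = 240
MASK_BITS = 32
MASK_COUNT = DEVICES * PRIORITIES // MASK_BITS  # 960 / 32 = 30


class SigaVector:
    def __init__(self):
        self.masks = [0] * MASK_COUNT  # thirty 32-bit masks
        self.work_vector = 0           # one summary bit per mask

    def post(self, device, priority):
        """Host side: flag a work request for (device, priority)."""
        bit = device * PRIORITIES + priority  # assumed layout
        index, offset = divmod(bit, MASK_BITS)
        self.masks[index] |= 1 << offset
        self.work_vector |= 1 << index

    def pending(self):
        """Gateway side: scan the Work Vector, then only flagged masks."""
        found = []
        work = self.work_vector
        while work:
            low = work & -work  # isolate lowest set bit
            index = low.bit_length() - 1
            work ^= low
            mask = self.masks[index]
            while mask:
                low_bit = mask & -mask
                offset = low_bit.bit_length() - 1
                mask ^= low_bit
                found.append(divmod(index * MASK_BITS + offset, PRIORITIES))
        return found  # list of (device, priority) pairs
```

With only a few requests outstanding, the scan touches the Work
Vector plus the handful of masks it flags rather than all 960
bits.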

In one embodiment of the present invention, all high
priority traffic is handled completely and then the amount of
data processed from the other queues is assigned a weight using
the SETPRIORITYTHRESHOLD command. Once the lower priority queues
have been handled, it is possible some data could be residual in
these queues. It then becomes necessary to go back and check the
priority queues if new requests have arrived. To make sure only
new requests have been added to the list when it is refetched,
each time the adapter reads the SIGA Vector, it sets a field to
indicate the vector has been read. The next Host request will
then see the adapter has read the SIGA Vector. It is then
completely cleared by the Host code before setting the new
request.
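
The read-marker handshake can be sketched in Python as below. The
class and attribute names are hypothetical, the "field" is
modeled as a simple boolean, and the sketch ignores the atomic
update that a real shared-storage implementation would need.

```python
class SharedSigaArea:
    """Simplified model of the SIGA Vector plus its 'has been read' field."""

    def __init__(self):
        self.bits = 0
        self.read_by_adapter = False

    def host_post(self, bit):
        """Host side: clear stale requests once the adapter has read them."""
        if self.read_by_adapter:
            self.bits = 0
            self.read_by_adapter = False
        self.bits |= 1 << bit

    def adapter_read(self):
        """Adapter side: take the current requests and mark the vector read.

        The adapter is expected to remember the returned snapshot itself,
        because the Host will clear the vector on its next request.
        """
        snapshot = self.bits
        self.read_by_adapter = True
        return snapshot
```

After a read, the next Host request finds the marker set, clears
the vector, and posts only the new request, so a refetch by the
adapter shows new work only.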

Error Reporting During Run Time - Non Catastrophic 

As data is being transferred across the QDIO interface to
and from the Gateway device, it is possible for errors to
periodically occur in the data stream. Intermittent errors can
be recovered. Errors which become persistent need to be detected
so the interface can be taken down and then restarted. All this
needs to happen at run time and require no user intervention.

To accomplish this, Error States are defined for the SLSB
Status Block. When the adapter detects errors in the data
stream, an error state is set in the SLSB. The specific reason
for the error is stored in the SBALF (SBAL Flags), which are
located in the SBAL which is associated with the SLSB entry that
has the error state set. Using this approach, the Host is able to
monitor the number of errors which occur within a specified time
period. If the number of errors exceeds the pre-determined
threshold which has been set, the QDIO Connection is terminated.
If the error rate stays under the specified threshold, the
connection will remain active.
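
The threshold check can be illustrated with a short Python sketch
that counts errors inside a sliding time window. The class name
ErrorMonitor and the sliding-window interpretation of "a
specified time period" are assumptions; the text does not say how
the period is measured.

```python
from collections import deque


class ErrorMonitor:
    def __init__(self, threshold, period):
        self.threshold = threshold  # maximum errors tolerated per period
        self.period = period        # length of the window, in seconds
        self._times = deque()       # timestamps of recent errors

    def record_error(self, now):
        """Record an SLSB error state seen at time `now`.

        Returns True when the error count within the window exceeds the
        threshold, meaning the connection should be terminated.
        """
        self._times.append(now)
        # Forget errors that have aged out of the window.
        while self._times and now - self._times[0] > self.period:
            self._times.popleft()
        return len(self._times) > self.threshold
```

Three errors spread over a long time never trip a threshold of
three, whereas four errors inside one window do.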

Concurrent Patch 



Concurrent Patch is a feature provided in QDIO. The
Concurrent Patch feature allows the customer to install a new
level of microcode to the adapter without interrupting any of the
applications and/or services using the adapter. For Channel
adapters this was not a major problem because none of the
applications using the channel adapter required any
connection-type information to be kept across the code update.

For the Network Adapters which are using TCP/IP, the adapter
contains information about each client station in the LAN and
each connection which is present with the Host Applications. The
connections are active once the adapter is activated and remain
present while the card is active. There are no Gateway platforms
today which will keep the TCP/IP sessions active during a code
update. The QDIO Hydranet adapter is the first to offer the
Concurrent Patch feature in a Gateway environment.

QDIO in Virtual-Machine Environment 

The key control mechanism for QDIO is the storage-list-state
block (SLSB), comprising a vector of state entries for each
queue, with one entry per storage-block-address list (SBAL). An
SBAL contains the addresses of a set of storage blocks within
main memory, the collection of which is termed a buffer, either
input or output.

Each SLSB entry represents a finite-state machine (FSM), an
automaton well known in the art, defining the states of a
computing process, the inputs and outputs of the process for each
state, and the allowed transitions among the states. Whereas a
standard FSM is executed by a single process, the FSM in an SLSB
entry in this invention is shared and used as a control and
communication mechanism by a host program on the one hand and by
an I/O adapter on the other. The FSM is used by each to drive
the other. The set of states of the FSM is strictly divided into
two subsets, program-owned states and adapter-owned states. The
ownership is indicated by bits within the encodings of the
state-values. Each side exchanges ownership with the other to
cause control and processing to pass between them.

Thus, the FSM of an SLSB entry embodies two sets (one each 
in the program and the adapter) of one or more processes under 
the control of the FSM definition. These sets of processes are 
kept separate and carefully controlled through the two distinct 
subsets of FSM states, implying ownership by one side or the 
other, as described above. However, within either side (program 
or adapter), multiple processes may share and be controlled by 
the FSM. Such sharing processes within a given side may use the 
state-values within its own side's ownership subset to control 
and communicate with other processes on its own side, but not the 
other side. That is, neither side is permitted to understand or 
act upon the meaning of a specific state-value that is owned by 
the opposite side, other than to transfer ownership according 
to the FSM definition. This strict separation of the program and 
the adapter within the FSM ensures that each side can be a free- 
running process (or set of processes) through the entire set of 
FSMs in an SLSB without the possibility of deadlock. 

Within the preferred implementation, there are separate FSM 
definitions for input and output queues. The five FSM states for 
input queues are as follows: 

* input buffer not initialized (program owned) 

* input buffer empty (adapter owned) 

* input buffer primed (program owned) 

* input buffer error (program owned) 

* input buffer halted (program owned)

The five FSM states for output queues in the preferred
implementation are as follows:

* output buffer not initialized (program owned)

* output buffer empty (program owned)

* output buffer primed (adapter owned)

* output buffer error (program owned)

* output buffer halted (program owned)
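
The two state sets and their ownership can be written down as a
small Python sketch. The text says ownership is carried by bits
within the state-value encodings but does not give those
encodings, so the tables below use plain names and an Owner enum
instead; all identifiers are hypothetical.

```python
from enum import Enum


class Owner(Enum):
    PROGRAM = "program"
    ADAPTER = "adapter"


# Ownership of each input-queue state, as listed above.
INPUT_STATES = {
    "NOT_INITIALIZED": Owner.PROGRAM,
    "EMPTY": Owner.ADAPTER,
    "PRIMED": Owner.PROGRAM,
    "ERROR": Owner.PROGRAM,
    "HALTED": Owner.PROGRAM,
}

# Ownership of each output-queue state, as listed above.
OUTPUT_STATES = {
    "NOT_INITIALIZED": Owner.PROGRAM,
    "EMPTY": Owner.PROGRAM,
    "PRIMED": Owner.ADAPTER,
    "ERROR": Owner.PROGRAM,
    "HALTED": Owner.PROGRAM,
}


def may_act(side, state, table):
    """A side may interpret and act on a state only if it owns it."""
    return table[state] is side
```

Note the symmetry: the adapter owns empty input buffers (to fill
them) and primed output buffers (to send them), and every other
state belongs to the program.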

Figures 7 and 8 depict sample Input and Output queues as
relating to this particular area, as will be discussed below.
With the FSM in each SLSB entry being executed cooperatively but
independently by the program and the adapter, the processing of
an entire input or output queue is accomplished by sequentially
cycling through the full set of FSMs (and, hence, buffers) within
the SLSB controlling the queue.

The following control mechanisms are an abstract, simplified
version of the preferred implementation for the proper sequencing
through the buffers.

Output Queues:

Program

Current_Entry = 1;
LOOP: DO WHILE Current_State = ^PRIMED AND output data exists;
         Execute FSM for Current_Entry;
         Current_Entry = Current_Entry + 1 modulo SLSB_Size;
      END;
      WAIT (for more data from application or Current_State
         change);
      GO TO LOOP;

Adapter

Current_Entry = 1;
LOOP: DO WHILE Current_State = PRIMED;
         Execute FSM for Current_Entry;
         Current_Entry = Current_Entry + 1 modulo SLSB_Size;
      END;
      WAIT (for SIGA-w signal);
      GO TO LOOP;

Input Queues:

Program

Current_Entry = 1;
LOOP: DO WHILE Current_State = ^EMPTY;
         Execute FSM for Current_Entry;
         Current_Entry = Current_Entry + 1 modulo SLSB_Size;
      END;
      WAIT (for PCI or timer interruption);
      GO TO LOOP;

Adapter

Current_Entry = 1;
LOOP: DO WHILE Current_State = EMPTY AND input data exists;
         Execute FSM for Current_Entry;
         Current_Entry = Current_Entry + 1 modulo SLSB_Size;
      END;
      WAIT (for more data, SIGA-r signal, or Current_State
         change);
      GO TO LOOP;

These control mechanisms (i.e., the FSMs and the sequencing
logic to loop through the FSMs in an SLSB) keep the program and
the adapter in synchronism with each other without deadlock as
the cooperating processes on each side move in tandem through
different portions of the SLSB. The invariant conditions are
that each side always processes FSM states not processed by the
other, and as data is moved inbound or outbound, each side sets
FSM states processed by the other. As long as one side is
running, it sets states that will be processed by the other
side, and vice versa.
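
The output-queue loops above can be exercised with a small
single-threaded Python sketch. It is only an illustration under
several assumptions: entries are indexed from 0 rather than 1,
"Execute FSM" is reduced to moving one item and flipping the
state, only the EMPTY and PRIMED states are modeled, and the WAIT
steps are replaced by the caller invoking each side in turn.

```python
EMPTY, PRIMED = "EMPTY", "PRIMED"


class OutputQueue:
    def __init__(self, size):
        self.slsb = [EMPTY] * size    # one state entry per buffer
        self.buffers = [None] * size
        self.program_entry = 0        # program's Current_Entry
        self.adapter_entry = 0        # adapter's Current_Entry

    def program_run(self, pending):
        """Program loop: fill buffers while the entry is not PRIMED."""
        size = len(self.slsb)
        while pending and self.slsb[self.program_entry] != PRIMED:
            self.buffers[self.program_entry] = pending.pop(0)
            self.slsb[self.program_entry] = PRIMED  # hand to adapter
            self.program_entry = (self.program_entry + 1) % size

    def adapter_run(self):
        """Adapter loop: transmit buffers while the entry is PRIMED."""
        size = len(self.slsb)
        sent = []
        while self.slsb[self.adapter_entry] == PRIMED:
            sent.append(self.buffers[self.adapter_entry])
            self.slsb[self.adapter_entry] = EMPTY   # hand back to program
            self.adapter_entry = (self.adapter_entry + 1) % size
        return sent
```

With a four-entry queue and six items pending, the program fills
all four entries and stops at the first entry it finds still
PRIMED; once the adapter has emptied them, the program continues
from where it left off, so the two sides chase each other around
the SLSB without either overtaking the other.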

The QDIO protocol so far described is extended in the
present invention to be used in a virtual-machine environment
through minor additions along with careful design and attention
to the following considerations.

ij = 

A key aspect of QDIO is the shared-memory model by which the
program and the adapter share a common queue structure and data
areas in a computer's main memory. With the free-running
cooperative processes described above, controlled by a set of
FSMs in an SLSB for each data queue, the use of shared memory
avoids the processor and channel-subsystem overhead of start-
processing and one-for-one interruptions associated with
traditional input/output operations.

Such a shared-memory model is problematic in the environment
of a virtual machine, which is an image of a real machine created
by a program called a virtual-machine hypervisor. The apparent
real storage of the virtual machine is in fact pageable storage
of the hypervisor. The adapter, lacking dynamic-address-
translation (DAT) capability and the hypervisor's associated DAT
tables, needs to know the actual real-storage addresses of the
queue structures and data.

The shared-memory model of the QDIO protocol is simulated by
the virtual-machine hypervisor through the use of "shadow" copies
of key control blocks that are maintained by the hypervisor. The
QDIO control-block structure is designed in such a way as to
carefully separate and isolate main-memory addresses from
non-address information.

Among the QDIO control blocks, the storage list (SL) and
storage-block-address list (SBAL) are designed specifically to
contain addresses needed by the adapter. The queue-information
block (QIB) and the storage-list-information block (SLIB) are
designed specifically to exclude any such addresses. The memory
pages containing the QIB and the SLIB are fixed in main memory by
the hypervisor and, thus, follow the QDIO shared-memory model:
the program accesses the QIB and the SLIB using addresses that
are in fact virtual, while the adapter accesses these same
control blocks with real addresses.

The SLs and SBALs are shadowed by the hypervisor. The SLSB
is also shadowed, although it contains no addresses, because of
its definition as the controlling mechanism for the program's and
the adapter's cooperating processes. The changing of FSM states
in the SLSB controls the program's and the adapter's access to
the other queue components that require address translation, and
hence, FSM state-changes must be gated and controlled by the
hypervisor using the shadow-block mechanism.

The QDIO protocol is started by the existing START
SUBCHANNEL (SSCH) machine instruction in the preferred
implementation, but could be started by one or more new
instructions defined for the purpose. For pageable virtual
machines, SSCH is intercepted by the hypervisor so as
to begin the simulation of the QDIO protocol. During the
simulation of the Establish-QDIO-Queues channel command, the
hypervisor builds shadow copies of the SL, SBAL, and SLSB control
blocks. The queue-descriptor record (QDR) associated with the
Establish-QDIO-Queues command contains the main-memory addresses
of the QDIO queue components as seen by the program. The
hypervisor translates those addresses, as well as addresses
within the SL and SBALs, in building its own copy of the QDR
and the shadow SL and SBALs. Translation of addresses within the
SBALs may be delayed until the simulation of the Activate-QDIO-
Queues channel command if the program chooses to defer its data-
buffer assignments until the queues are activated.
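
The address translation performed while building a shadow SBAL
can be sketched in Python as below. This is a simplified model
under stated assumptions: a 4096 byte page size, a dictionary
standing in for the hypervisor's DAT tables, SBAL entries modeled
as (address, length) pairs that do not cross a page boundary, and
pages that are already resident and fixed. The function names are
hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def translate(guest_address, page_table):
    """Map an address as seen by the program to a real-storage address.

    `page_table` maps guest page numbers to real page frame numbers and
    stands in for the hypervisor's DAT tables.
    """
    page, offset = divmod(guest_address, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset


def build_shadow_sbal(sbal, page_table):
    """Return a shadow copy of `sbal` holding real addresses.

    Lengths and any other non-address information are copied unchanged;
    only the address in each entry is translated.
    """
    return [(translate(address, page_table), length)
            for address, length in sbal]
```

The adapter would then be given the shadow list, while the
program continues to see its own SBAL with the addresses it
originally supplied.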

Once the QDIO protocol is started, the virtual-machine
hypervisor needs to intervene to perform address translation
whenever the program presents a new set of empty or full buffers
to the adapter for inbound or outbound data, respectively. The
hypervisor also intervenes when synchronization is needed between
the program's original SLSB and the hypervisor's shadow SLSB used
by the adapter. Such address translation and SLSB
synchronization is implicit during the hypervisor's
interception of program-controlled interruptions (PCIs) and the
SIGA-w and SIGA-r instructions. The SIGA-s instruction causes
hypervisor intervention when there is no signal needed between
the program and the adapter in the non-virtual-machine
environment, but there is nevertheless a need for address
translation and SLSB synchronization for the virtual machine. In
the preferred implementation, SIGA-s is used by the program when
recovering emptied outbound buffers from the adapter and after a
program timer expires to signal the need for checking of SLSB
states on input queues.

The previously-described FSM definitions and sequencing
protocols for the SLSB support and make possible the operation of
QDIO in virtual machines. The concept of ownership of SBALs and
data buffers, as embodied in the separate program-owned and
adapter-owned states of the FSMs, means that the adapter never
accesses main memory for which the adapter does not have
ownership within the applicable FSM. Ownership is only
transferred from program to adapter by the setting of an
adapter-owned state in the FSM by the program and the subsequent
synchronization of the program's FSM with the adapter's shadow
FSM by the hypervisor, after the hypervisor performs the
necessary address translation. Likewise, ownership is only
transferred from adapter to program by the setting of a program-
owned state in the FSM by the adapter and the subsequent
synchronization of the real and shadow FSMs, after the hypervisor
updates the applicable real SBALs from the shadow SBALs with, for
example, the actual data count moved through the adapter.

The mutually-exclusive FSM-state subsets between the program
and the adapter, with the rule of each side setting ownership by
the other side to transfer processing between them, enables
straightforward synchronization of the real and shadow SLSBs by
the hypervisor. The hypervisor maintains a "hidden shadow" copy
of the SLSB to reflect the state of the SLSB at the previous
synchronization point. This permits easy recognition of changes
made by the program to the real SLSB and by the adapter to the
shadow SLSB, allowing the proper updates in each direction
between the real and shadow SLSBs with one pass through the
three copies of the SLSB at each synchronization point.
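
The one-pass synchronization over the three SLSB copies can be
sketched in Python as follows. The function name synchronize and
the list-of-strings representation are assumptions, and the
address translation and SBAL updates that accompany each transfer
of ownership are left out.

```python
def synchronize(real, shadow, hidden):
    """Synchronize the SLSB copies in place with a single pass.

    real   -- the program's SLSB
    shadow -- the hypervisor's shadow SLSB used by the adapter
    hidden -- the "hidden shadow": the state at the previous sync point

    Because each side only changes entries it owns, at most one of
    `real` and `shadow` can differ from `hidden` for a given entry.
    """
    for i in range(len(hidden)):
        if real[i] != hidden[i]:
            # The program changed this entry: propagate to the adapter.
            shadow[i] = real[i]
        elif shadow[i] != hidden[i]:
            # The adapter changed this entry: propagate to the program.
            real[i] = shadow[i]
        hidden[i] = real[i]  # remember the state for the next sync point
```

For an output queue in which the program has primed one more
buffer and the adapter has emptied another since the last
synchronization point, a single call leaves all three copies
identical and reflecting both changes.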

The mutually-exclusive FSM-state subsets and the sequencing
rules through the SLSB entries further support virtualization by
ensuring that synchronization by the hypervisor does not disrupt
or interfere with concurrent operations by the program and the
adapter on their respective copies of the SLSB. The boundaries
between program-owned and adapter-owned states constantly move
downward through the SLSB and back to the top. Neither side
looks beyond its own contiguous set(s) of owned FSMs, with the
boundaries being apparent. This means the method of
synchronization by the hypervisor, whether top-down, bottom-up,
or middle-to-middle in either direction, can have no lasting
effect of disrupting the program's or the adapter's operation.

While the invention has been described in detail herein in 
accordance with certain preferred embodiments thereof, many 
modifications and changes therein may be effected by those 
skilled in the art. Accordingly, it is intended by the appended 
claims to cover all such modifications and changes as fall within 
the true spirit and scope of the invention.